The Transition from Centralized Machine Learning to Federated Learning for Mental Health in Education: A Survey of Current Methods and Future Directions

Ebrahimi, Maryam, Sahay, Rajeev, Hosseinalipour, Seyyedali, Akram, Bita

arXiv.org Artificial Intelligence

Research has increasingly explored the application of artificial intelligence (AI) and machine learning (ML) within the mental health domain to enhance both patient care and healthcare provider efficiency. Given that mental health challenges frequently emerge during early adolescence -- the critical years of high school and college -- investigating AI/ML-driven mental health solutions within the education domain is of paramount importance. Nevertheless, conventional AI/ML techniques follow a centralized model training architecture, which poses privacy risks due to the need for transferring students' sensitive data from institutions, universities, and clinics to central servers. Federated learning (FL) has emerged as a solution to address these risks by enabling distributed model training while maintaining data privacy. Despite its potential, research on applying FL to analyze students' mental health remains limited. In this paper, we aim to address this limitation by proposing a roadmap for integrating FL into mental health data analysis within educational settings. We begin by providing an overview of mental health issues among students and reviewing existing studies where ML has been applied to address these challenges. Next, we examine broader applications of FL in the mental health domain to emphasize the lack of focus on educational contexts. Finally, we propose promising research directions focused on using FL to address mental health issues in the education sector, including a discussion of the synergies between the proposed directions and broader human-centered domains. By categorizing the proposed research directions into short- and long-term strategies and highlighting the unique challenges at each stage, we aim to encourage the development of privacy-conscious AI/ML-driven mental health solutions.
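The distributed training pattern that FL relies on, local updates on private data followed by server-side aggregation, can be sketched in a minimal FedAvg-style example. This is an illustrative assumption, not the survey's method: the logistic-regression client model and all function names here are hypothetical.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local step: logistic-regression gradient descent on its
    private data. Raw records never leave the client; only weights do."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (preds - y) / len(y)
    return w

def federated_average(client_weights, client_sizes):
    """Server-side FedAvg: average client models weighted by dataset size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Two simulated institutions with private data; the server only sees weights.
rng = np.random.default_rng(0)
global_w = np.zeros(3)
clients = [(rng.normal(size=(20, 3)), rng.integers(0, 2, 20)) for _ in range(2)]
for _ in range(10):  # communication rounds
    updates = [local_update(global_w, X, y) for X, y in clients]
    global_w = federated_average(updates, [len(y) for _, y in clients])
```

The privacy property rests on the fact that only the weight vectors cross the network; each institution's student records stay local.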


Multi-objective Combinatorial Methodology for Nuclear Reactor Site Assessment: A Case Study for the United States

Erdem, Omer, Daley, Kevin, Hoelzle, Gabrielle, Radaideh, Majdi I.

arXiv.org Artificial Intelligence

As the global demand for clean energy intensifies to achieve sustainability and net-zero carbon emission goals, nuclear energy stands out as a reliable solution. However, fully harnessing its potential requires overcoming key challenges, such as the high capital costs associated with nuclear power plants (NPPs). One promising strategy to mitigate these costs involves repurposing sites with existing infrastructure, including coal power plant (CPP) locations, which offer pre-built facilities and utilities. Additionally, brownfield sites - previously developed or underutilized lands often impacted by industrial activity - present another compelling alternative. These sites typically feature valuable infrastructure that can significantly reduce the costs of NPP development. This study introduces a novel multi-objective optimization methodology, leveraging combinatorial search to evaluate over 30,000 potential NPP sites in the United States. Our approach addresses gaps in the current practice of assigning pre-determined weights to each site attribute, a practice that can bias the ranking. Each site is assigned a performance-based score, derived from a detailed combinatorial analysis of its site attributes. The methodology generates a comprehensive database comprising site locations (inputs), attributes (outputs), site scores (outputs), and the contribution of each attribute to the site score (outputs). We then use this database to train a machine learning neural network model, enabling rapid predictions of nuclear siting suitability across any location in the contiguous United States.


LegalTurk Optimized BERT for Multi-Label Text Classification and NER

Zeidi, Farnaz, Amasyali, Mehmet Fatih, Erol, Çiğdem

arXiv.org Artificial Intelligence

The introduction of the Transformer neural network, along with techniques like self-supervised pre-training and transfer learning, has paved the way for advanced models like BERT. Despite BERT's impressive performance, opportunities for further enhancement exist. To our knowledge, most efforts have focused on improving BERT's performance in English and in general domains, with no study specifically addressing the legal Turkish domain. Our study is primarily dedicated to enhancing the BERT model within the legal Turkish domain through modifications in the pre-training phase. In this work, we introduce our innovative modified pre-training approach by combining diverse masking strategies. In the fine-tuning task, we focus on two essential downstream tasks in the legal domain: named entity recognition and multi-label text classification. To evaluate our modified pre-training approach, we fine-tuned all customized models alongside the original BERT models to compare their performance. Our modified approach demonstrated significant improvements in both NER and multi-label text classification tasks compared to the original BERT model. Finally, to showcase the impact of our proposed models, we trained our best models with different corpus sizes and compared them with BERTurk models. The experimental results demonstrate that our innovative approach, despite being pre-trained on a smaller corpus, competes with BERTurk.
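As a hedged illustration (not the paper's actual pre-training code), combining masking strategies can be as simple as switching between independent single-token masking and contiguous-span masking when corrupting input sequences; the function and names below are mine:

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mlm_prob=0.15, strategy="token", rng=None):
    """Toy mixed-masking illustration: 'token' masks individual positions
    independently; 'span' masks one contiguous run of positions."""
    rng = rng or random.Random(0)
    out = list(tokens)
    if strategy == "token":
        for i in range(len(out)):
            if rng.random() < mlm_prob:
                out[i] = MASK
    elif strategy == "span":
        n_mask = max(1, round(len(out) * mlm_prob))
        start = rng.randrange(len(out) - n_mask + 1)
        for i in range(start, start + n_mask):
            out[i] = MASK
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return out

# A pre-training pipeline could draw the strategy at random per sequence.
sentence = "mahkeme karari temyiz edildi ve dosya yargitaya gonderildi".split()
masked = mask_tokens(sentence, mlm_prob=0.25, strategy="span")
```

Alternating strategies like this exposes the model to both local (token-level) and longer-range (span-level) reconstruction targets during pre-training.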


Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking

Acikgoz, Emre Can, Erdogan, Mete, Yuret, Deniz

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are becoming crucial across various fields, emphasizing the urgency for high-quality models in underrepresented languages. This study explores the unique challenges faced by low-resource languages, such as data scarcity, model selection, evaluation, and computational limitations, with a special focus on Turkish. We conduct an in-depth analysis to evaluate the impact of training strategies, model choices, and data availability on the performance of LLMs designed for underrepresented languages. Our approach includes two methodologies: (i) adapting existing LLMs originally pretrained in English to understand Turkish, and (ii) developing a model from the ground up using Turkish pretraining data, both supplemented with supervised fine-tuning on a novel Turkish instruction-tuning dataset aimed at enhancing reasoning capabilities. The relative performance of these methods is evaluated through the creation of a new leaderboard for Turkish LLMs, featuring benchmarks that assess different reasoning and knowledge skills. Furthermore, we conduct experiments on data and model scaling, both during pretraining and fine-tuning, simultaneously emphasizing the capacity for knowledge transfer across languages and addressing the challenges of catastrophic forgetting encountered during fine-tuning in a different language. Our goal is to offer a detailed guide for advancing the LLM framework in low-resource linguistic contexts, thereby making natural language processing (NLP) benefits more globally accessible.


Context-dependent Explainability and Contestability for Trustworthy Medical Artificial Intelligence: Misclassification Identification of Morbidity Recognition Models in Preterm Infants

Guzey, Isil, Ucar, Ozlem, Ciftdemir, Nukhet Aladag, Acunas, Betul

arXiv.org Artificial Intelligence

Although machine learning (ML) models of AI achieve high performance in medicine, they are not free of errors. Empowering clinicians to identify incorrect model recommendations is crucial for engendering trust in medical AI. Explainable AI (XAI) aims to address this requirement by clarifying AI reasoning to support the end users. Several studies on biomedical imaging have achieved promising results recently. Nevertheless, solutions for models using tabular data do not yet meet the requirements of clinicians. This paper proposes a methodology to support clinicians in identifying failures of ML models trained with tabular data. We built our methodology on three main pillars: decomposing the feature set by leveraging clinical context latent space, assessing the clinical association of global explanations, and Latent Space Similarity (LSS) based local explanations. We demonstrated our methodology on ML-based recognition of preterm infant morbidities caused by infection. The risk of mortality, lifelong disability, and antibiotic resistance due to model failures was an open research question in this domain. With our approach, we successfully identified misclassification cases of two models. By contextualizing local explanations, our solution provides clinicians with actionable insights to support their autonomy for informed final decisions.


Structure Aware Negative Sampling in Knowledge Graphs

Ahrabian, Kian, Feizi, Aarash, Salehi, Yasmin, Hamilton, William L., Bose, Avishek Joey

arXiv.org Machine Learning

Learning low-dimensional representations for entities and relations in knowledge graphs using contrastive estimation represents a scalable and effective method for inferring connectivity patterns. A crucial aspect of contrastive learning approaches is the choice of corruption distribution that generates hard negative samples, which force the embedding model to learn discriminative representations and find critical characteristics of observed data. While earlier methods either employ overly simple corruption distributions (e.g., uniform), yielding easy, uninformative negatives, or use sophisticated adversarial distributions with challenging optimization schemes, they do not explicitly incorporate known graph structure, resulting in suboptimal negatives. In this paper, we propose Structure Aware Negative Sampling (SANS), an inexpensive negative sampling strategy that utilizes the rich graph structure by selecting negative samples from a node's k-hop neighborhood. Empirically, we demonstrate that SANS finds semantically meaningful negatives and is competitive with SOTA approaches while requiring no additional parameters or difficult adversarial optimization.
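The core idea, drawing negatives from a node's k-hop neighborhood rather than uniformly, can be sketched as follows. This is a simplified adjacency-matrix illustration; the function names and the fallback behavior are my assumptions, not the paper's implementation:

```python
import numpy as np

def k_hop_reachability(adj: np.ndarray, k: int) -> np.ndarray:
    """Boolean matrix: reach[i, j] is True if j is within k hops of i
    (excluding i itself)."""
    reach = np.zeros_like(adj, dtype=bool)
    power = np.eye(adj.shape[0], dtype=int)
    for _ in range(k):
        power = np.minimum(power @ adj, 1)  # clip entries to 0/1
        reach |= power.astype(bool)
    np.fill_diagonal(reach, False)
    return reach

def sample_negative_tail(head, true_tails, reach, rng):
    """SANS-style corruption: replace the tail with an entity from the head's
    k-hop neighborhood that does not complete an observed triple."""
    candidates = np.setdiff1d(np.flatnonzero(reach[head]), list(true_tails))
    if candidates.size == 0:  # fall back to uniform corruption
        candidates = np.setdiff1d(np.arange(len(reach)), list(true_tails))
    return int(rng.choice(candidates))

# Path graph 0-1-2-3: within 2 hops of node 0 lie nodes 1 and 2.
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
reach = k_hop_reachability(adj, k=2)
neg = sample_negative_tail(0, true_tails={1}, reach=reach,
                           rng=np.random.default_rng(0))
```

Because k-hop neighbors tend to be semantically related to the head entity, such corrupted triples are harder for the embedding model to separate than uniform negatives.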


SARS-CoV-2 virus RNA sequence classification and geographical analysis with convolutional neural networks approach

Yazar, Selcuk

arXiv.org Artificial Intelligence

Covid-19 infection, which spread worldwide in December 2019 and is still active, has caused more than 250 thousand deaths to date. Research on this subject has focused on analyzing the genetic structure of the virus, developing vaccines, the course of the disease, and its source. In this study, RNA sequences belonging to the SARS-CoV-2 virus are transformed into gene motifs with two basic image processing algorithms and classified with convolutional neural network (CNN) models. The CNN models achieved an average Area Under Curve (AUC) value of 98% on RNA sequences classified as Asia, Europe, America, and Oceania. The resulting artificial neural network model was used for phylogenetic analysis of the variant of the virus isolated in Turkey. The classification results were compared with gene alignment values in the GISAID database, where SARS-CoV-2 virus records from all over the world are kept. Our experimental results reveal that detecting the geographic distribution of the virus with CNN models may serve as an efficient method. Keywords: Deep Learning, Bioinformatics, Convolutional neural network, SARS-CoV-2, Pattern Classification. Introduction: Artificial intelligence practices, and particularly deep learning studies, are widely used in many research fields, including medicine and bioinformatics. CNN models are very successful at lesion and disease diagnosis, especially in medical imaging. Beyond image processing and natural language processing, deep learning is also widely applied to time-series data with approaches such as Long Short-Term Memory.
In deep learning practices, low-level features of data such as DNA sequences, pathology images, and tomography scans can be learned directly from the data, largely eliminating the need for manual feature engineering.
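The abstract does not detail the paper's two image-processing algorithms, but a common way to turn an RNA sequence into a CNN-ready array is a one-hot "image" encoding, sketched here purely as an illustrative assumption:

```python
import numpy as np

RNA_BASES = "ACGU"  # one channel per nucleotide

def sequence_to_image(seq: str, length: int = 64) -> np.ndarray:
    """Encode an RNA sequence as a 4 x length one-hot array, truncating or
    padding (with all-zero columns) to a fixed width a CNN can consume."""
    seq = seq.upper()[:length].ljust(length, "N")
    img = np.zeros((4, length), dtype=np.float32)
    for i, base in enumerate(seq):
        channel = RNA_BASES.find(base)
        if channel >= 0:  # unknown bases such as "N" stay all-zero
            img[channel, i] = 1.0
    return img

img = sequence_to_image("AUGGCCAUU", length=16)
```

Fixed-width encodings like this let standard 2D convolution layers slide over sequence positions as if they were image columns.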


UK to invest £2.6M in drone and satellite tech to deliver vital supplies

Daily Mail - Science & tech

The UK government is setting aside £2.6 million for new satellite and drone technology that could deliver essential supplies during the coronavirus lockdown. The UK Space Agency (UKSA) is funding new solutions to deliver equipment such as test kits, masks, gowns and goggles for frontline NHS staff. The joint initiative with the European Space Agency could lead to vital equipment soaring through British skies via drones to support the NHS in tackling COVID-19. Companies can submit their proposals, including ideas for deployment and a pilot phase, on the European Space Agency (ESA) website. The UK's space industry is also looking for ways to combat the spread of coronavirus and prevent future epidemics using satellites.